Features for automatic discourse analysis of paragraphs
Authors
Abstract
In this paper, we investigate which information is useful for detecting rhetorical (RST) relations between (Multi-)Sentential Discourse Units ((M-)SDUs), that is, text spans consisting of one or more sentences, within the same paragraph. To do so, we simplified the task of discourse parsing to a decision problem: whether an (M-)SDU is rhetorically related to a preceding or to a following (M-)SDU. Employing the RST Treebank (Carlson et al. 2003), we offered this choice to machine learning algorithms together with syntactic, lexical, referential, discourse and surface features. Next, the features were ranked on the basis of (1) the models established by the classification algorithms and (2) feature selection metrics. Highly ranked features that predict the presence of a rhetorical relation are syntactic similarity, word overlap, word similarity, continuous punctuation and many reference features. Other features are used to introduce new topics or arguments: time references, proper nouns, definite articles and the word further.
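As a rough illustration of the decision problem described in the abstract, the sketch below chooses between "preceding" and "following" attachment using a single lexical feature, word overlap. This is not the authors' method: the abstract does not say how word overlap was computed, so Jaccard overlap, the naive tokenizer, the tie-breaking rule and all function names (`tokens`, `word_overlap`, `attach_direction`) are assumptions made for the example. The paper combines many features in learned classifiers rather than a single hand-written rule.

```python
import re


def tokens(text):
    """Return the set of lowercased alphanumeric tokens (a deliberately naive tokenizer)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def word_overlap(a, b):
    """Jaccard overlap between the token sets of two text spans (assumed metric)."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def attach_direction(prev_sdu, sdu, next_sdu):
    """Guess whether `sdu` relates to the preceding or the following unit.

    Uses word overlap alone; ties go to 'preceding' (an arbitrary choice).
    """
    left = word_overlap(prev_sdu, sdu)
    right = word_overlap(sdu, next_sdu)
    return "preceding" if left >= right else "following"


if __name__ == "__main__":
    print(attach_direction(
        "The bank raised rates.",
        "The rates rise hurt borrowers.",
        "Meanwhile, the weather was sunny.",
    ))
```

In a setup closer to the one the abstract describes, the overlap scores would be two columns in a feature vector passed to a classifier, alongside the syntactic, referential, discourse and surface features.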
Similar resources
Language Complexity, Accuracy and Fluency in Different Types of Writing Paragraph: Do the Raters Notice Such Effect
The aim of the present study was to investigate the effects of two types of paragraph on EFL learners’ written production. It addressed the issue of how three aspects of language production (i.e. complexity, accuracy, and fluency) vary among two types of paragraphs (i.e. paragraphs of chronology and cause-effect) written by EFL learners. Thirty intermediate level learners of English participate...
Automatic Paragraph Segmentation with Lexical and Prosodic Features
As long-form spoken documents become more ubiquitous in everyday life, so does the need for automatic discourse segmentation in spoken language processing tasks. Although previous work has focused on broad topic segmentation, detection of finer-grained discourse units, such as paragraphs, is highly desirable for presenting and analyzing spoken content. To better understand how different aspects...
A Critical Discourse Analysis of the Event of September 11, 2001 in American and Syrian Print Media Discourse
Aiming at highlighting the important role of print media discourse in the implicit transfer of the dominant ideology of discourse context, the present data-driven paper demonstrates how the lexical features of repetition and synonymy as well as the structural and thematic features of passivization, nominalization and predicated theme were utilized by the discourse producers to mediate betwee...
Automatic Prostate Cancer Segmentation Using Kinetic Analysis in Dynamic Contrast-Enhanced MRI
Background: Dynamic contrast enhanced magnetic resonance imaging (DCE-MRI) provides functional information on the microcirculation in tissues by analyzing the enhancement kinetics which can be used as biomarkers for prostate lesions detection and characterization. Objective: The purpose of this study is to investigate spatiotemporal patterns of tumors by extracting semi-quantitative as well as w...
Free Model of Sentence Classifier for Automatic Extraction of Topic Sentences
This research employs a free model that uses only sentential features, without paragraph context, to extract the topic sentences of a paragraph. To find the optimal combination of features, corpus-based classification is used to construct a sentence classifier as the model. The sentence classifier is trained using a Support Vector Machine (SVM). The experiment shows that position and meta-discours...